A NEW APPROACH TO THE GIANT COMPONENT PROBLEM



SVANTE JANSON AND MALWINA J. LUCZAK 

Abstract. We study the largest component of a random (multi)graph
on $n$ vertices with a given degree sequence. We let $n \to \infty$. Then, under
some regularity conditions on the degree sequences, we give conditions
on the asymptotic shape of the degree sequence that imply that with
high probability all the components are small, and other conditions that
imply that with high probability there is a giant component and the
sizes of its vertex and edge sets satisfy a law of large numbers; under
suitable assumptions these are the only two possibilities. In particular,
we recover the results by Molloy and Reed [23; 24] on the size of the
largest component in a random graph with a given degree sequence.

We further obtain a new sharp result for the giant component just
above the threshold, generalizing the case of $G(n,p)$ with
$np = 1 + \omega(n)n^{-1/3}$, where $\omega(n) \to \infty$ arbitrarily slowly.

Our method is based on the properties of empirical distributions of 
independent random variables, and leads to simple proofs. 



1. Introduction 

For many years, questions concerning the size and structure of the largest 
component in a random graph have attracted a lot of attention. There 
have by now been quite a number of studies for the Bernoulli random graph 
G{n,p) with n vertices and edge probability p, and for the uniformly random 
graph G{n,m) with n vertices and m edges (see for instance 0; 17\ and the 



references therein). Further, a number of studies [23; 24; 19] have considered
the emergence of a giant component in a random graph with a specified 
degree sequence. In [24], Molloy and Reed found the threshold for the
appearance of a giant component in a random graph on $n$ vertices with
a given degree sequence; in [23], they gave further results including the
size of this giant component above this critical window. Their strategy
was to analyse an edge deletion algorithm that finds the components in a
graph, showing that the corresponding random process is well approximated
by the solution to a system of differential equations. The proof is rather
long and complicated, and uses a bound of the order $n^{1/4}$ on the maximum
degree. More recently, Kang and Seierstad [19] have considered the
near-critical behaviour of such graphs, once again assuming that, for some
$\varepsilon > 0$, the maximum degree does not exceed $n^{1/4-\varepsilon}$. Using singularity analysis of
generating functions, they determine the size of the giant component very 
close to the critical window, with a gap logarithmic in the number of vertices. 
In this paper, we present a simple solution to the giant component
problem. Unlike Molloy and Reed [23; 24], we do not use differential equations,
but rely solely on the convergence of empirical distributions of independent
random variables. (We use a variant of the method we used in [15; 16] to
study the $k$-core of a random graph.) In the super-critical regime, we require
only conditions on the second moment of the asymptotic degree distribution; 
in the critical regime, we require a fourth moment condition, but we are able 
to go all the way to the critical window, without any logarithmic separa- 
tion. This is striking, as that logarithmic (or even larger) separation is often 
very hard to get rid of, see for instance [ij] in the case of percolation on the 
Cartesian product of two complete graphs on n vertices, or 0] in percolation 
on the $n$-cube, and also [19] for the model analysed in the present paper.
Like Molloy and Reed [23; 24], we work directly in the configuration model
used to construct the random graph, exposing the edges one by one as they 
are needed. 

We work with random graphs with given vertex degrees. Results for 
some other random graph models, notably for G{n,p) and G{n,m), follow 
immediately by conditioning on the vertex degrees. 

Our method uses a version of the standard exploration of components. 
A commonly used, and very successful, method to study the giant compo- 
nent is to make a branching process approximation of the early stages of 
this exploration, thus focussing on the beginning of the exploration of each 
component and the conditions for not becoming extinct too soon; see e.g. 
Janson, Luczak and Rucihski \17^; Molloy and Reed [Isi]; Kang and Seierstad 
[19] and, for some more complicated cases, Britton, Janson and Martin-Löf
[8]. It should be noted that, in contrast, our method focuses on the condition
for ending each exploration.



2. Notation and results 

To state our results we introduce some notation. For a graph $G$, let
$v(G)$ and $e(G)$ denote the numbers of vertices and edges in $G$, respectively;
further, let $v_k(G)$ be the number of vertices of degree $k$, $k \ge 0$.

Let $n \in \mathbb{N}$ and let $(d_i)_1^n$ be a sequence of non-negative integers. We let
$G(n, (d_i)_1^n)$ be a random graph with degree sequence $(d_i)_1^n$, uniformly chosen
among all possibilities (tacitly assuming that there is any such graph at all).

It will be convenient in the proofs below to work with multigraphs, that is
to allow multiple edges and loops. More precisely, we shall use the following
standard type of random multigraph: Let $n \in \mathbb{N}$ and let $(d_i)_1^n$ be a sequence
of non-negative integers such that $\sum_{i=1}^n d_i$ is even. We let $G^*(n, (d_i)_1^n)$ be the
random multigraph with given degree sequence $(d_i)_1^n$, defined by the
configuration model (see e.g. Bollobás [5]): take a set of $d_i$ half-edges for each vertex
$i$, and combine the half-edges into pairs by a uniformly random matching of
the set of all half-edges. Note that $G^*(n, (d_i)_1^n)$ does not have exactly the
uniform distribution over all multigraphs with the given degree sequence;
there is a weight with a factor $1/j!$ for every edge of multiplicity $j$, and
a factor $1/2$ for every loop, see e.g. [fl §1]. However, conditioned on the
multigraph being a (simple) graph, we obtain $G(n, (d_i)_1^n)$, the uniformly
distributed random graph with the given degree sequence.
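The construction just described can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names `configuration_multigraph` and `is_simple` are hypothetical (they do not come from the paper), vertices are labelled 0, ..., n-1, and the uniform matching is produced by shuffling the list of half-edges and pairing consecutive entries.

```python
import random
from collections import Counter


def configuration_multigraph(degrees, seed=None):
    """Pair half-edges uniformly at random; return the edges as (u, v) pairs.

    Hypothetical helper: degrees[i] is the degree d_i of vertex i.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    # One list entry per half-edge, labelled by the vertex it is attached to.
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    # A uniform shuffle followed by pairing consecutive entries gives a
    # uniformly random perfect matching of the half-edges.
    rng.shuffle(half_edges)
    return list(zip(half_edges[0::2], half_edges[1::2]))


def is_simple(edges):
    """True if the multigraph has no loops and no multiple edges."""
    multiplicity = Counter(frozenset(e) for e in edges)
    no_loops = all(u != v for u, v in edges)
    return no_loops and all(c == 1 for c in multiplicity.values())
```

Rejecting the outcome until `is_simple` returns true would, in line with the last sentence above, give a sample of the uniform simple graph with the given degrees; this is practical only when the probability of a simple outcome is not too small.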

We assume throughout the paper that we are given a sequence $(d_i)_1^n =
(d_i^{(n)})_1^n$ for each $n \in \mathbb{N}$ (or at least for some sequence $n \to \infty$); for
notational simplicity we will usually not show the dependence on $n$ explicitly.
We consider asymptotics as $n \to \infty$, and all unspecified limits below are as
$n \to \infty$. We say that an event holds whp (with high probability), if it holds
with probability tending to 1 as $n \to \infty$. We shall use $\overset{p}{\to}$ for convergence
in probability and $O_p$ and $o_p$ in the standard way (see e.g. Janson, Luczak
and Ruciński [17]); for example, if $(X_n)$ is a sequence of random variables,
then $X_n = O_p(1)$ means "$X_n$ is bounded in probability" and $X_n = o_p(1)$
means that $X_n \overset{p}{\to} 0$.
We write

$$m = m(n) := \frac{1}{2}\sum_{i=1}^{n} d_i$$

and

$$n_k = n_k(n) := \#\{i : d_i = k\}, \qquad k \ge 0;$$

thus $m$ is the number of edges and $n_k$ is the number of vertices of degree $k$ in
the random graph $G(n, (d_i)_1^n)$ (or $G^*(n, (d_i)_1^n)$). We assume that the given
$(d_i)_1^n$ satisfy the following regularity conditions, cf. Molloy and Reed [23; 24]
(where similar but not identical conditions are assumed).

Condition 2.1. For each $n$, $(d_i)_1^n = (d_i^{(n)})_1^n$ is a sequence of non-negative
integers such that $\sum_{i=1}^n d_i$ is even. Furthermore, $(p_k)_{k=0}^\infty$ is a probability
distribution independent of $n$ such that

(i) $n_k/n = \#\{i : d_i = k\}/n \to p_k$ as $n \to \infty$, for every $k \ge 0$;

(ii) $\lambda := \sum_k k p_k \in (0,\infty)$;

(iii) $\sum_{i=1}^n d_i^2 = O(n)$;

(iv) $p_1 > 0$.

Let $D_n$ be a random variable defined as the degree of a random (uniformly
chosen) vertex in $G(n, (d_i)_1^n)$ or $G^*(n, (d_i)_1^n)$; thus

$$\mathbb{P}(D_n = k) = n_k/n. \tag{2.1}$$

Note that $\mathbb{E} D_n = n^{-1}\sum_{i=1}^n d_i = 2m/n$.
Further, let $D$ be a random variable with the distribution $\mathbb{P}(D = k) = p_k$.
Then (i) can be written

$$D_n \overset{d}{\to} D. \tag{2.2}$$

In other words, $D$ describes the asymptotic distribution of the degree of a
random vertex in $G(n, (d_i)_1^n)$. Furthermore, (ii) is $\lambda = \mathbb{E} D \in (0, \infty)$, (iv) is
$\mathbb{P}(D = 1) > 0$, and (iii) can be written

$$\mathbb{E} D_n^2 = O(1). \tag{2.3}$$

Remark 2.2. In particular, (2.3) implies that the random variables $D_n$ are
uniformly integrable, and thus Condition 2.1(i), in the form (2.2), implies
$\mathbb{E} D_n \to \mathbb{E} D$, i.e.

$$\frac{2m}{n} = n^{-1}\sum_{i=1}^{n} d_i \to \lambda; \tag{2.4}$$

see e.g. [11, Theorems 5.4.2 and 5.5.9].
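For a concrete degree sequence, the quantities $m$, $n_k$, $\mathbb{E} D_n$ and $\mathbb{E} D_n^2$ are simple to tabulate. The sketch below is illustrative only; the function name `degree_statistics` and the dictionary keys are hypothetical choices, not notation from the paper. It also reports $\mathbb{E} D_n(D_n-2)$, the finite-$n$ analogue of the quantity $\mathbb{E} D(D-2)$ that governs the threshold in the results of this section.

```python
from collections import Counter


def degree_statistics(degrees):
    """Summarise a degree sequence (d_1, ..., d_n); hypothetical helper."""
    n = len(degrees)
    total = sum(degrees)
    if total % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    return {
        "m": total // 2,                                   # number of edges
        "n_k": dict(Counter(degrees)),                     # n_k = #{i : d_i = k}
        "mean": total / n,                                 # E D_n = 2m/n
        "second_moment": sum(d * d for d in degrees) / n,  # E D_n^2
        "d_times_d_minus_2": sum(d * (d - 2) for d in degrees) / n,
    }
```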
Let

$$g(x) := \sum_{k=0}^{\infty} p_k x^k = \mathbb{E} x^D, \tag{2.5}$$

the probability generating function of the probability distribution $(p_k)_{k=0}^\infty$,
and define further

$$h(x) := x g'(x) = \sum_{k=1}^{\infty} k p_k x^k, \tag{2.6}$$

$$H(x) := \lambda x^2 - h(x). \tag{2.7}$$

Note that $h(0) = 0$ and $h(1) = \lambda$, and thus $H(0) = H(1) = 0$. Note also
that

$$H'(1) = 2\lambda - \sum_k k^2 p_k = \mathbb{E}(2D - D^2) = -\mathbb{E} D(D-2). \tag{2.8}$$

See further Lemma 5.5.

Our first theorem is essentially the main results of Molloy and Reed [23; 24].

Theorem 2.3. Suppose that Condition 2.1 holds and consider the random
graph $G(n, (d_i)_1^n)$, letting $n \to \infty$. Let $C_1$ and $C_2$ be the largest and second
largest components of $G(n, (d_i)_1^n)$.

(i) If $\mathbb{E} D(D-2) = \sum_k k(k-2)p_k > 0$, then there is a unique $\xi \in (0, 1)$
such that $H(\xi) = 0$, or equivalently $g'(\xi) = \lambda\xi$, and

$$v(C_1)/n \overset{p}{\to} 1 - g(\xi) > 0,$$

$$v_k(C_1)/n \overset{p}{\to} p_k(1 - \xi^k), \quad \text{for every } k \ge 0,$$

$$e(C_1)/n \overset{p}{\to} \tfrac{1}{2}\lambda(1 - \xi^2),$$

while $v(C_2)/n \overset{p}{\to} 0$ and $e(C_2)/n \overset{p}{\to} 0$.

(ii) If $\mathbb{E} D(D-2) = \sum_k k(k-2)p_k \le 0$, then $v(C_1)/n \overset{p}{\to} 0$ and
$e(C_1)/n \overset{p}{\to} 0$.

The same results hold for $G^*(n, (d_i)_1^n)$.

In the usual, somewhat informal, language, the theorem shows that $G(n, (d_i)_1^n)$
has a giant component if and only if $\mathbb{E} D(D-2) > 0$.
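As a numerical illustration of Theorem 2.3(i), the root $\xi$ and the limiting fractions can be computed for any degree distribution with finite support. The sketch below is a minimal example under stated assumptions: the function name `giant_component_limits` is hypothetical, the distribution is passed as a list `p` with `p[k]` equal to $p_k$, and $p_1 > 0$ is assumed as in Condition 2.1(iv). It bisects on $H(x)/x$, which by Lemma 5.5 below is negative on $(0,\xi)$ and positive on $(\xi,1)$ in the supercritical case.

```python
def giant_component_limits(p, tol=1e-12):
    """Return (xi, v, e): the root of H in (0,1) and the limits of
    v(C_1)/n and e(C_1)/n, or None if E D(D-2) <= 0.

    Hypothetical helper; p[k] = p_k on a finite support, with p[1] > 0.
    """
    lam = sum(k * pk for k, pk in enumerate(p))
    if sum(k * (k - 2) * pk for k, pk in enumerate(p)) <= 0:
        return None  # no giant component, Theorem 2.3(ii)

    def phi(x):
        # phi(x) = H(x)/x = lam*x - sum_k k p_k x^(k-1)
        return lam * x - sum(k * pk * x ** (k - 1)
                             for k, pk in enumerate(p) if k >= 1)

    lo, hi = 0.0, 0.5          # phi(0) = -p_1 < 0
    while phi(hi) <= 0:        # move hi towards 1 until it lies in (xi, 1)
        hi = (hi + 1.0) / 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if phi(mid) < 0:
            lo = mid
        else:
            hi = mid
    xi = (lo + hi) / 2.0
    g_xi = sum(pk * xi ** k for k, pk in enumerate(p))
    return xi, 1.0 - g_xi, 0.5 * lam * (1.0 - xi * xi)
```

For instance, with $p_1 = p_3 = 1/2$ one has $\lambda = 2$ and $H(x)/x = 2x - \tfrac12 - \tfrac32 x^2$, whose root in $(0,1)$ is $\xi = 1/3$; the limits are then $1 - g(\xi) = 22/27$ and $\tfrac12\lambda(1-\xi^2) = 8/9$.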
In the critical case, we can be more precise. 

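Theorem 2.4. Suppose that Condition 2.1 holds and that $\mathbb{E} D(D-2) =
\sum_k k(k-2)p_k = 0$. Assume further that $\alpha_n := \mathbb{E} D_n(D_n-2) = \sum_{i=1}^n d_i(d_i-2)/n > 0$
and, moreover, $n^{1/3}\alpha_n \to \infty$, and that

$$\sum_{i=1}^{n} d_i^{4+\eta} = O(n) \tag{2.9}$$

for some $\eta > 0$. Let $\beta := \mathbb{E} D(D-1)(D-2)$. Then, $\beta > 0$ and

$$v(C_1) = \frac{2\lambda}{\beta}\, n\alpha_n + o_p(n\alpha_n),$$

$$v_k(C_1) = \frac{2}{\beta}\, k p_k\, n\alpha_n + o_p(n\alpha_n), \quad \text{for every } k \ge 0,$$

$$e(C_1) = \frac{2\lambda}{\beta}\, n\alpha_n + o_p(n\alpha_n),$$

while $v(C_2) = o_p(n\alpha_n)$ and $e(C_2) = o_p(n\alpha_n)$.
The same results hold for $G^*(n, (d_i)_1^n)$.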

Remark 2.5. Condition (j2.9p may be written KDn'^'^ < oo; it thus im- 
plies (j2.3p and Condition [2]J l^iii) moreover, it implies that and D"^ are 
uniformly integrable. Hence, using (|2:2]) . (12:9]) implies ED^ ^ ED^ and 
ED^ — > ED'^. In particular, the conditions of Theorem 12.41 imply 

$$\alpha_n := \mathbb{E} D_n(D_n-2) \to \mathbb{E} D(D-2) = 0 \tag{2.10}$$

and

$$\beta_n := \mathbb{E} D_n(D_n-1)(D_n-2) \to \mathbb{E} D(D-1)(D-2) = \beta. \tag{2.11}$$

We do not think that the condition (|2.9|) is best possible; we conjecture 
that, in addition to Condition 12.11 it is enough to assume that Dl are 
uniformly integrable, or, equivalently, that ED"^ — > ED^ < oo. 



Condition 2.1(iii) and (2.4) imply that

$$\liminf \mathbb{P}\bigl(G^*(n, (d_i)_1^n) \text{ is a simple graph}\bigr) > 0, \tag{2.12}$$

see for instance Bollobas pl , McKay [2l[ or McKay and Wormald [12] under 
some extra condition on $\max d_i$ and Janson [13] for the general case. Since
we obtain $G(n, (d_i)_1^n)$ by conditioning $G^*(n, (d_i)_1^n)$ on being a simple graph,
and all results in Theorems 2.3 and 2.4 are (or can be) stated in terms of
convergence in probability, the results for $G(n, (d_i)_1^n)$ follow from the results
for $G^*(n, (d_i)_1^n)$ by this conditioning.

We will prove Theorems 2.3 and 2.4 for $G^*(n, (d_i)_1^n)$ in Sections 5 and 6.
The proofs use the same arguments, but we find it convenient to first
discuss the somewhat simpler case of Theorem 2.3 in detail and then do the
necessary modifications for Theorem 2.4.



Remark 2.6. The assumption Condition 2.1(iii) is used in our proof mainly
for the reduction to $G^*(n, (d_i)_1^n)$. In fact, the proof of Theorem 2.3 for
$G^*(n, (d_i)_1^n)$ holds with simple modifications also if Condition 2.1(iii) is
replaced by the weaker condition that $D_n$ are uniformly integrable, or
equivalently, see Remark 2.2, $\mathbb{E} D_n \to \mathbb{E} D$ or (2.4). It might also be possible
to extend Theorem 2.3 for $G(n, (d_i)_1^n)$ too, under some weaker assumption
than Condition 2.1(iii), by combining estimates of $\mathbb{P}(G^*(n, (d_i)_1^n)$ is simple)
from e.g. McKay and Wormald [22] with more precise estimates of the error
probabilities in Section 5, but we have not pursued this.



Remark 2.7. Condition 2.1(iv) excludes the case $p_1 = 0$; we comment
briefly on this case here. Note first that in this case, $\mathbb{E} D(D-2) = \sum_{k \ge 3} k(k-2)p_k \ge 0$,
with strict inequality as soon as $p_k > 0$ for some $k \ge 3$.

First, if $p_1 = 0$ and $\mathbb{E} D(D-2) > 0$, i.e. if $p_1 = 0$ and $\sum_{k \ge 3} p_k > 0$, it is
easily seen (by modifying the proof of Theorem 2.3 below or by adding $\varepsilon n$
vertices of degree 1 and applying Theorem 2.3) that all but $o_p(n)$ vertices
and edges belong to a single giant component. Hence, the conclusions of
Theorem 2.3(i) hold with $\xi = 0$. (In this case, $H(x) > 0$ for every $x \in (0, 1)$.)

The case $p_1 = 0$ and $\mathbb{E} D(D-2) = 0$, i.e. $p_k = 0$ for all $k \ne 0, 2$, is much
more exceptional. (In this case, $H(x) = 0$ for all $x$.) We give three examples
showing that quite different behaviours are possible. Since isolated vertices
do not matter, let us assume $p_0 = 0$ too and consider thus the case $p_2 = 1$.

One example is when all $d_i = 2$, so we are studying a random 2-regular
graph. In this case, the components are cycles. It is well-known, and easy to
see, that (for the multigraph version) the distribution of cycle lengths is given
by the Ewens's sampling formula ESF(1/2), see e.g. Arratia, Barbour and
Tavaré [2], and thus $v(C_1)/n$ converges in distribution to a non-degenerate
distribution on [0,1] and not to any constant [3, Lemma 5.7]. Moreover,
the same is true for $v(C_2)/n$ (and for $v(C_3)/n, \dots$), so in this case there are
several large components.

A second case with $p_2 = 1$ is obtained by adding a small number of vertices
of degree 1. (More precisely, let $n_1 \to \infty$, $n_1/n \to 0$, and $n_2 = n - n_1$.) It
is then easy to see that $v(C_1) = o_p(n)$.

A third case with $p_2 = 1$ is obtained by instead adding a small number of
vertices of degree 4 (i.e., $n_4 \to \infty$, $n_4/n \to 0$, and $n_2 = n - n_4$). By regarding
each vertex of degree 4 as two vertices of degree 2 that have merged, it is
easy to see that in this case $v(C_1) = n - o_p(n)$, so there is a giant component
containing almost everything. (The case $\xi = 0$ again.)

3. $G(n,p)$, $G(n,m)$ and other random graphs

The results above can be applied to some other random graph models
too by conditioning on the vertex degrees; this works whenever the random
graph conditioned on the degree sequence has a uniform distribution over
all possibilities. Notable examples of such random graphs are $G(n,p)$ and
G{n,m), and other examples are given in P, Section 16.4], If, 
furthermore, Condition 2.1 and (2.9) hold in probability (where now $d_i$ are
the random vertex degrees), then Theorems 2.3 and 2.4 hold; in the latter,
we define $\alpha_n := \sum_{i=1}^n d_i(d_i-2)/n$, which now is random. (For the proof, it
is convenient to use the Skorohod coupling theorem [18, Theorem 4.30] and
assume that the conditions hold a.s.)

For example, for $G(n,p)$ with $np \to \lambda$ or $G(n,m)$ with $2m/n \to \lambda$, where
$0 < \lambda < \infty$, the assumptions hold with $D \sim \mathrm{Po}(\lambda)$ and thus $g(x) = e^{\lambda(x-1)}$,
$h(x) = \lambda x e^{\lambda(x-1)}$, $H(x) = \lambda x\bigl(x - e^{\lambda(x-1)}\bigr)$, and we recover both the
classical threshold $\lambda = 1$ and the standard equation $\xi = e^{\lambda(\xi-1)}$ for the size
of the giant component when $\lambda > 1$.
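For the Poisson case the equation $\xi = e^{\lambda(\xi-1)}$ can be solved numerically by fixed-point iteration. The sketch below is only an illustration; the function name `poisson_giant_fraction` and the fixed iteration count are assumptions made for this example. Since $g(\xi) = e^{\lambda(\xi-1)} = \xi$ here, the limiting fraction of vertices in the giant component given by Theorem 2.3(i) is $1 - g(\xi) = 1 - \xi$.

```python
import math


def poisson_giant_fraction(lam, iterations=200):
    """Limit of v(C_1)/n for D ~ Po(lam), i.e. 1 - xi where xi is the
    smallest solution of xi = exp(lam * (xi - 1)).

    Hypothetical helper. Starting from 0, the iterates increase to the
    smallest fixed point; convergence is slow when lam is close to 1.
    """
    if lam <= 1.0:
        return 0.0  # no giant component at or below the threshold
    xi = 0.0
    for _ in range(iterations):
        xi = math.exp(lam * (xi - 1.0))
    return 1.0 - xi
```

For $\lambda = 2$ this gives a giant component containing roughly 79.7% of the vertices.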

If we consider G{n,p) with p = (1 +£n)/n where e„ ^ in Theorem 12. 4^ 
we have anj^n — ^ 1 by the second moment method as soon as ne„ — >■ oo, 
so we need $n^{1/3}\varepsilon_n \to \infty$ in order to apply Theorem 2.4. On the other hand,
it is well known 0; [H; [13] that if $n^{1/3}\varepsilon_n = O(1)$, then $v(C_1)$ and $v(C_2)$ are
both of the same order $n^{2/3}$ and Theorem 2.4 fails, which shows that the
condition $n^{1/3}\alpha_n \to \infty$ in Theorem 2.4 is best possible.



4. Finding the largest component 

The components of an arbitrary finite graph or multigraph can be found 
by the following standard procedure. Pick an arbitrary vertex v and deter- 
mine the component of v as follows: include all the neighbours of v in an 
arbitrary order; then add in the neighbours of neighbours, and so on, until 
no more vertices can be added. The vertices included until this moment 
form the component of $v$. If there are still vertices left in the graph, pick
any such vertex $w$ and repeat the above to determine the second
component (the component of vertex $w$). Carry on in this manner until all the
components have been found. 

It is clear that we obtain the same result as follows. Regard each edge 
as consisting of two half-edges, each half-edge having one endpoint. We 
will label the vertices as sleeping or awake (= used) and the half-edges as 
sleeping, active or dead; the sleeping and active half-edges are also called
living. We start with all vertices and half-edges sleeping. Pick a vertex and
label its half-edges as active. Then take any active half-edge, say $x$, and find
its partner $y$ in the graph; label these two half-edges as dead; further, if the
endpoint of $y$ is sleeping, label it as awake and all other half-edges there as
active. Repeat as long as there is any active half-edge.

When there is no active half-edge left, we have obtained the first
nent. Then start again with another vertex until all components are found. 

We apply this algorithm to a random multigraph $G^*(n, (d_i)_1^n)$ with a given
degree sequence, revealing its edges during the process. We thus observe
initially only the vertex degrees and the half-edges, but not how they are
joined to form edges. Hence, each time we need a partner of a half-edge, it
is uniformly distributed over all other living half-edges. (The dead half-edges
are the ones that already are paired into edges.) We make these random
choices by giving the half-edges i.i.d. random maximal lifetimes with the
distribution Exp(1); in other words, each half-edge dies spontaneously with
rate 1 (unless killed earlier). Each time we need to find the partner of a
half-edge $x$, we then wait until the next living half-edge $\ne x$ dies and take
that one. We then can formulate an algorithm, constructing $G^*(n, (d_i)_1^n)$
and exploring its components simultaneously, as follows. Recall that we
start with all vertices and half-edges sleeping.

C1 If there is no active half-edge (as in the beginning), select a sleeping
vertex and declare it awake and all its half-edges active. For
definiteness, we choose the vertex by choosing a half-edge uniformly at
random among all sleeping half-edges. If there is no sleeping
half-edge left, the process stops; the remaining sleeping vertices are all
isolated and we have explored all other components.

C2 Pick an active half-edge (which one does not matter) and kill it, i.e.,
change its status to dead.

C3 Wait until the next half-edge dies (spontaneously). This half-edge is
joined to the one killed in the previous step C2 to form an edge of the
graph. If the vertex it belongs to is sleeping, we change this vertex
to awake and all other half-edges there to active. Repeat from C1.

The components are created between the successive times C1 is performed;
the vertices in the component created during one of these intervals are the
vertices that are awakened during the interval. Note also that a component
is completed and C1 is performed exactly when the number of active
half-edges is 0 and a half-edge dies at a vertex where all other half-edges (if any)
are dead.
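Steps C1 to C3 can be simulated directly. The following is a rough sketch under stated assumptions, not code from the paper: the function name `explore_configuration_model` and all variable names are hypothetical, the Exp(1) lifetimes are drawn up front, and "the next living half-edge to die" is found by walking through the half-edges in order of increasing lifetime while skipping those already dead. For simplicity, the sleeping half-edges are re-listed each time C1 is performed, which is fine for small examples.

```python
import random


def explore_configuration_model(degrees, seed=None):
    """Run steps C1-C3; return (component_sizes, edges).

    Hypothetical helper. Isolated vertices are never awakened and are
    therefore not counted in component_sizes.
    """
    rng = random.Random(seed)
    owner = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(owner) % 2 != 0:
        raise ValueError("the sum of the degrees must be even")
    half_edges_of = [[] for _ in degrees]
    for h, v in enumerate(owner):
        half_edges_of[v].append(h)

    # i.i.d. Exp(1) maximal lifetimes; death_order lists half-edges by lifetime.
    lifetime = [rng.expovariate(1.0) for _ in owner]
    death_order = sorted(range(len(owner)), key=lifetime.__getitem__)

    SLEEPING, ACTIVE, DEAD = 0, 1, 2
    state = [SLEEPING] * len(owner)
    awake = [False] * len(degrees)
    active = []            # stack of active half-edges (may hold stale entries)
    component_sizes = []
    edges = []
    pointer = 0            # everything before this index in death_order is dead

    def wake(v):
        awake[v] = True
        component_sizes[-1] += 1
        for h in half_edges_of[v]:
            if state[h] == SLEEPING:
                state[h] = ACTIVE
                active.append(h)

    while True:
        while active and state[active[-1]] != ACTIVE:
            active.pop()                      # drop stale stack entries
        if not active:                        # C1: start a new component
            sleeping = [h for h, s in enumerate(state) if s == SLEEPING]
            if not sleeping:
                break
            component_sizes.append(0)
            wake(owner[rng.choice(sleeping)])
        x = active.pop()                      # C2: kill an active half-edge
        state[x] = DEAD
        while state[death_order[pointer]] == DEAD:
            pointer += 1
        y = death_order[pointer]              # C3: next living half-edge to die
        if not awake[owner[y]]:
            wake(owner[y])
        state[y] = DEAD
        edges.append((owner[x], owner[y]))
    return component_sizes, edges
```

The returned edge list is a realisation of the multigraph, and each entry of `component_sizes` is the number of vertices awakened between two successive executions of C1.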

5. Analysis of the algorithm for $G^*(n, (d_i)_1^n)$

Let $S(t)$ and $A(t)$ be the numbers of sleeping and active half-edges,
respectively, at time $t$, and let $L(t) = S(t) + A(t)$ be the number of living
half-edges. As is customary, and for definiteness, we define these random
functions to be right-continuous.

Let us first look at $L(t)$. We start with $2m$ half-edges, all sleeping and
thus living, but we immediately perform C1 and C2 and kill one of them;
thus $L(0) = 2m - 1$. In the sequel, as soon as a living half-edge dies, we
perform C3 and then (instantly) either C2 or both C1 and C2. Since C1 does
not change the number of living half-edges while C2 and C3 each decrease
it by 1, the total result is that $L(t)$ is decreased by 2 each time one of the
living half-edges dies, except when the last living one dies and the process
terminates.

Lemma 5.1. As $n \to \infty$,

$$\sup_{t \ge 0} \bigl|n^{-1}L(t) - \lambda e^{-2t}\bigr| \overset{p}{\to} 0.$$

Proof. This (or rather an equivalent statement in a slightly different
situation) was proved in [15] as a consequence of the Glivenko-Cantelli theorem
[18, Proposition 4.24] on convergence of empirical distribution functions. It
also follows easily from (the proof of) Lemma 6.2 below if we replace $\alpha_n$ by
1. □



Next consider the sleeping half-edges. Let $V_k(t)$ be the number of sleeping
vertices of degree $k$ at time $t$; thus

$$S(t) = \sum_{k=1}^{\infty} k V_k(t).$$

Note that C2 does not affect sleeping half-edges, and that C3 implies that
each sleeping vertex of degree $k$ is eliminated (i.e., awakened) with intensity
$k$, independently of all other vertices. There are also some sleeping vertices
eliminated by C1.

We first ignore the effect of C1 by letting $\tilde V_k(t)$ be the number of vertices
of degree $k$ such that all its half-edges have maximal lifetimes $> t$. (I.e.,
none of its $k$ half-edges would have died spontaneously up to time $t$, assuming
they all escaped C1.) Let further $\tilde S(t) := \sum_k k \tilde V_k(t)$.



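Lemma 5.2. As $n \to \infty$,

$$\sup_{t \ge 0} \bigl|n^{-1}\tilde V_k(t) - p_k e^{-kt}\bigr| \overset{p}{\to} 0 \tag{5.1}$$

for every $k \ge 0$ and

$$\sup_{t \ge 0} \Bigl|n^{-1}\sum_{k=0}^{\infty} \tilde V_k(t) - g(e^{-t})\Bigr| \overset{p}{\to} 0, \tag{5.2}$$

$$\sup_{t \ge 0} \bigl|n^{-1}\tilde S(t) - h(e^{-t})\bigr| \overset{p}{\to} 0. \tag{5.3}$$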


Proof. The statement (5.1), again, follows from the Glivenko-Cantelli
theorem, see [15], or from the proof of Lemma 6.3 below. (The case $k = 0$ is
trivial, with $\tilde V_0(t) = n_0$ for all $t$.)

By Remark 2.2, $D_n$ are uniformly integrable, which means that for every
$\varepsilon > 0$ there exists $K < \infty$ such that for all $n$, $\sum_{k>K} k n_k/n = \mathbb{E}(D_n; D_n > K) < \varepsilon$.
We may further assume (or deduce by Fatou's inequality) $\sum_{k>K} k p_k < \varepsilon$,
and obtain by (5.1) whp

$$\sup_{t \ge 0} \bigl|n^{-1}\tilde S(t) - h(e^{-t})\bigr|
= \sup_{t \ge 0} \Bigl|\sum_{k=1}^{\infty} k\bigl(n^{-1}\tilde V_k(t) - p_k e^{-kt}\bigr)\Bigr|
\le \sum_{k=1}^{K} k \sup_{t \ge 0}\bigl|n^{-1}\tilde V_k(t) - p_k e^{-kt}\bigr| + \sum_{k>K} k\Bigl(\frac{n_k}{n} + p_k\Bigr)
\le \varepsilon + \varepsilon + \varepsilon,$$

proving (5.3). An almost identical argument yields (5.2). □

The difference between $S(t)$ and $\tilde S(t)$ is easily estimated.

Lemma 5.3. If $d_{\max} := \max_i d_i$ is the maximum degree of $G^*(n, (d_i)_1^n)$,
then

$$0 \le \tilde S(t) - S(t) < \sup_{0 \le s \le t} \bigl(\tilde S(s) - L(s)\bigr) + d_{\max}.$$

Proof. Clearly, $V_k(t) \le \tilde V_k(t)$, and thus $S(t) \le \tilde S(t)$; furthermore, $\tilde S(t) - S(t)$
increases only as a result of C1, which acts to guarantee that $A(t) = L(t) - S(t) \ge 0$.

If C1 is performed at time $t$ and a vertex of degree $j > 0$ is awakened, then
C2 applies instantly and we have $A(t) = j - 1 < d_{\max}$, and consequently

$$\tilde S(t) - S(t) = \tilde S(t) - L(t) + A(t) < \tilde S(t) - L(t) + d_{\max}. \tag{5.4}$$

Furthermore, $\tilde S(t) - S(t)$ is never changed by C2 and either unchanged or
decreased by C3. Hence, $\tilde S(t) - S(t)$ does not increase until the next time
C1 is performed. Consequently, for any time $t$, if $s$ was the last time before
(or equal to) $t$ that C1 was performed, then $\tilde S(t) - S(t) \le \tilde S(s) - S(s)$, and
the result follows by (5.4). □



Let

$$\tilde A(t) := L(t) - \tilde S(t) = A(t) - \bigl(\tilde S(t) - S(t)\bigr). \tag{5.5}$$

Then, by Lemmas 5.1 and 5.2 and (2.7),

$$\sup_{t \ge 0} \bigl|n^{-1}\tilde A(t) - H(e^{-t})\bigr| \overset{p}{\to} 0. \tag{5.6}$$

Lemma 5.3 can be written

$$0 \le \tilde S(t) - S(t) < -\inf_{s \le t} \tilde A(s) + d_{\max}. \tag{5.7}$$

Remark 5.4. By (5.5) and (5.7), we obtain further the relation

$$\tilde A(t) \le A(t) < \tilde A(t) - \inf_{s \le t} \tilde A(s) + d_{\max}$$

which, perhaps, illuminates the relation between $A(t)$ and $\tilde A(t)$.

Lemma 5.5. Suppose that Condition 2.1 holds and let $H(x)$ be given by
(2.7).

(i) If $\mathbb{E} D(D-2) = \sum_k k(k-2)p_k > 0$, then there is a unique $\xi \in (0, 1)$
such that $H(\xi) = 0$; moreover, $H(x) < 0$ for $x \in (0, \xi)$ and $H(x) > 0$
for $x \in (\xi, 1)$.

(ii) If $\mathbb{E} D(D-2) = \sum_k k(k-2)p_k \le 0$, then $H(x) < 0$ for $x \in (0, 1)$.

Proof. As remarked earlier, $H(0) = H(1) = 0$ and $H'(1) = -\mathbb{E} D(D-2)$.
Furthermore, if we define $\varphi(x) := H(x)/x$, then $\varphi(x) = \lambda x - \sum_k k p_k x^{k-1}$ is
a concave function on $(0, 1]$, and it is strictly concave unless $p_k = 0$ for all
$k \ge 3$, in which case $H'(1) = -\mathbb{E} D(D-2) = p_1 > 0$.

In case (ii), we thus have $\varphi$ concave and $\varphi'(1) = H'(1) - H(1) \ge 0$,
with either the concavity or the inequality strict, and thus $\varphi'(x) > 0$ for all
$x \in (0, 1)$, whence $\varphi(x) < \varphi(1) = 0$ for $x \in (0, 1)$.

In case (i), $H'(1) < 0$, and thus $H(x) > 0$ for $x$ close to 1. Further,
$H'(0) = -h'(0) = -p_1 < 0$, and thus $H(x) < 0$ for $x$ close to 0. Hence there
is at least one $\xi \in (0, 1)$ with $H(\xi) = 0$, and since $H(x)/x$ is strictly concave
and also $H(1) = 0$, there is at most one such $\xi$ and the result follows. □



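Proof of Theorem 2.3(i). Let $\xi$ be the zero of $H$ given by Lemma 5.5(i) and
let $\tau := -\ln \xi$. Then, by Lemma 5.5, $H(e^{-t}) > 0$ for $0 < t < \tau$, and thus
$\inf_{t \le \tau} H(e^{-t}) = 0$. Consequently, (5.6) implies

$$n^{-1}\inf_{t \le \tau} \tilde A(t) = \inf_{t \le \tau} n^{-1}\tilde A(t) - \inf_{t \le \tau} H(e^{-t}) \overset{p}{\to} 0. \tag{5.8}$$

Further, by Condition 2.1(iii), $d_{\max} = O(n^{1/2})$, and thus $n^{-1}d_{\max} \to 0$.
Consequently, (5.7) and (5.8) yield

$$\sup_{t \le \tau} n^{-1}\bigl|\tilde A(t) - A(t)\bigr| = \sup_{t \le \tau} n^{-1}\bigl|\tilde S(t) - S(t)\bigr| \overset{p}{\to} 0 \tag{5.9}$$

and thus, by (5.6),

$$\sup_{t \le \tau} \bigl|n^{-1}A(t) - H(e^{-t})\bigr| \overset{p}{\to} 0. \tag{5.10}$$

Let $0 < \varepsilon < \tau/2$. Since $H(e^{-t}) > 0$ on the compact interval $[\varepsilon, \tau - \varepsilon]$,
(5.10) implies that whp $A(t)$ remains positive on $[\varepsilon, \tau - \varepsilon]$, and thus no new
component is started during this interval.

On the other hand, again by Lemma 5.5(i), $H(e^{-\tau-\varepsilon}) < 0$ and (5.6)
implies $n^{-1}\tilde A(\tau + \varepsilon) \overset{p}{\to} H(e^{-\tau-\varepsilon})$, while $A(\tau + \varepsilon) \ge 0$. Thus, with
$\delta := |H(e^{-\tau-\varepsilon})|/2 > 0$, whp

$$\tilde S(\tau + \varepsilon) - S(\tau + \varepsilon) = A(\tau + \varepsilon) - \tilde A(\tau + \varepsilon) \ge -\tilde A(\tau + \varepsilon) > n\delta, \tag{5.11}$$

while (5.9) yields $\tilde S(\tau) - S(\tau) < n\delta$ whp. Consequently, whp
$\tilde S(\tau + \varepsilon) - S(\tau + \varepsilon) > \tilde S(\tau) - S(\tau)$, so C1 is performed between $\tau$ and $\tau + \varepsilon$.

Let $T_1$ be the last time C1 was performed before $\tau/2$ and let $T_2$ be the next
time it is performed. We have shown that for any $\varepsilon > 0$, whp $0 \le T_1 \le \varepsilon$
and $\tau - \varepsilon \le T_2 \le \tau + \varepsilon$; in other words, $T_1 \overset{p}{\to} 0$ and $T_2 \overset{p}{\to} \tau$.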

We state the next step as a lemma that we will reuse. 




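Lemma 5.6. Let $T_1^*$ and $T_2^*$ be two (random) times when C1 is performed,
with $T_1^* \le T_2^*$, and assume that $T_1^* \overset{p}{\to} t_1$ and $T_2^* \overset{p}{\to} t_2$ where $0 \le t_1 \le
t_2 \le \tau$. If $C^*$ is the union of all components explored between $T_1^*$ and $T_2^*$,
then

$$v_k(C^*)/n \overset{p}{\to} p_k\bigl(e^{-kt_1} - e^{-kt_2}\bigr), \quad k \ge 0, \tag{5.12}$$

$$v(C^*)/n \overset{p}{\to} g(e^{-t_1}) - g(e^{-t_2}), \tag{5.13}$$

$$e(C^*)/n \overset{p}{\to} \tfrac{1}{2}h(e^{-t_1}) - \tfrac{1}{2}h(e^{-t_2}). \tag{5.14}$$

In particular, if $t_1 = t_2$, then $v(C^*)/n \overset{p}{\to} 0$ and $e(C^*)/n \overset{p}{\to} 0$.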

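Proof. $C^*$ contains all vertices awakened in the interval $[T_1^*, T_2^*)$ and no
others, and thus

$$v_k(C^*) = V_k(T_1^*-) - V_k(T_2^*-), \quad k \ge 1. \tag{5.15}$$

Since $T_2^* \overset{p}{\to} t_2 \le \tau$ and $H$ is continuous, $\inf_{t \le T_2^*} H(e^{-t}) \overset{p}{\to} \inf_{t \le t_2} H(e^{-t}) = 0$,
and (5.6) and (5.7) imply, in analogy with (5.8) and (5.9), $n^{-1}\inf_{t \le T_2^*} \tilde A(t) \overset{p}{\to} 0$
and

$$\sup_{t \le T_2^*} n^{-1}\bigl|\tilde S(t) - S(t)\bigr| \overset{p}{\to} 0. \tag{5.16}$$

Since $\tilde V_j(t) \ge V_j(t)$ for every $j$ and $t \ge 0$,

$$\tilde V_k(t) - V_k(t) \le k^{-1}\sum_{j=1}^{\infty} j\bigl(\tilde V_j(t) - V_j(t)\bigr) = k^{-1}\bigl(\tilde S(t) - S(t)\bigr), \quad k \ge 1. \tag{5.17}$$

Hence (5.16) implies, for every $k \ge 1$, $\sup_{t \le T_2^*} |\tilde V_k(t) - V_k(t)| = o_p(n)$. This
is further trivially true for $k = 0$ too. Consequently, using Lemma 5.2, for
$j = 1, 2$,

$$V_k(T_j^*-) = \tilde V_k(T_j^*-) + o_p(n) = n p_k e^{-kT_j^*} + o_p(n) = n p_k e^{-kt_j} + o_p(n), \tag{5.18}$$

and (5.12) follows by (5.15). Similarly, using $\sum_{k=0}^{\infty} \bigl(\tilde V_k(t) - V_k(t)\bigr) \le \tilde S(t) - S(t)$,

$$v(C^*) = \sum_{k=1}^{\infty} \bigl(V_k(T_1^*-) - V_k(T_2^*-)\bigr) = \sum_{k=1}^{\infty} \bigl(\tilde V_k(T_1^*-) - \tilde V_k(T_2^*-)\bigr) + o_p(n)
= n g(e^{-T_1^*}) - n g(e^{-T_2^*}) + o_p(n)$$

and

$$2e(C^*) = \sum_{k=1}^{\infty} k\bigl(V_k(T_1^*-) - V_k(T_2^*-)\bigr) = \sum_{k=1}^{\infty} k\bigl(\tilde V_k(T_1^*-) - \tilde V_k(T_2^*-)\bigr) + o_p(n)
= n h(e^{-T_1^*}) - n h(e^{-T_2^*}) + o_p(n),$$

and (5.13) and (5.14) follow. □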






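Let $C'$ be the component created at $T_1$ and explored until $T_2$. By Lemma 5.6,
with $t_1 = 0$ and $t_2 = \tau$,

$$v_k(C')/n \overset{p}{\to} p_k\bigl(1 - e^{-k\tau}\bigr), \tag{5.19}$$

$$v(C')/n \overset{p}{\to} g(1) - g(e^{-\tau}) = 1 - g(\xi), \tag{5.20}$$

$$e(C')/n \overset{p}{\to} \tfrac{1}{2}\bigl(h(1) - h(e^{-\tau})\bigr) = \tfrac{1}{2}\bigl(h(1) - h(\xi)\bigr) = \frac{\lambda}{2}\bigl(1 - \xi^2\bigr), \tag{5.21}$$

using (2.7) and $H(1) = H(\xi) = 0$.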

We have found one large component $C'$ with the claimed numbers of
vertices and edges. It remains to show that there is whp no other large
component. Therefore, let $T_3$ be the first time after $T_2$ that C1 is performed. Since
$\tilde S(t) - S(t)$ increases by at most $d_{\max} = o(n)$ each time C1 is performed,
we obtain from (5.16) that

$$\sup_{t \le T_3} \bigl(\tilde S(t) - S(t)\bigr) \le \sup_{t \le T_2} \bigl(\tilde S(t) - S(t)\bigr) + d_{\max} = o_p(n).$$

Comparing this to (5.11) we see that for every $\varepsilon > 0$, whp $\tau + \varepsilon > T_3$.
Since also $T_3 > T_2 \overset{p}{\to} \tau$, it follows that $T_3 \overset{p}{\to} \tau$. If $C''$ is the component
created between $T_2$ and $T_3$, then Lemma 5.6 applied to $T_2$ and $T_3$ yields
$v(C'')/n \overset{p}{\to} 0$ and $e(C'')/n \overset{p}{\to} 0$.

Next, let $\eta > 0$. Applying Lemma 5.6 to $T_0 := 0$ and $T_1$, we see that the
total number of vertices and edges in all components found before $C'$, i.e.,
before $T_1$, is $o_p(n)$, because $T_1 \overset{p}{\to} 0$. Hence, recalling $m = \Theta(n)$ by (2.4),

$$\mathbb{P}(\text{a component } C \text{ with } e(C) \ge \eta m \text{ is found before } C') \to 0. \tag{5.22}$$

On the other hand, conditioning on the final graph $G^*(n, (d_i)_1^n)$ that is
constructed by the algorithm, if there exists a component $C \ne C'$ in $G^*(n, (d_i)_1^n)$
with at least $\eta m$ edges that has not been found before $C'$, then with
probability at least $\eta$, the vertex chosen at random by C1 at $T_2$ starting the
component $C''$ belongs to $C$, and thus $C = C''$. Consequently,

$$\mathbb{P}(\text{a component } C \text{ with } e(C) \ge \eta m \text{ is found after } C')
\le \eta^{-1}\, \mathbb{P}(e(C'') \ge \eta m) \to 0. \tag{5.23}$$

Combining (5.22) and (5.23), we see that whp there is no component except
$C'$ with at least $\eta m$ edges. Taking $\eta$ small, this and (5.21) show that whp
$C' = C_1$, the largest component, and further $e(C_2) < \eta m$. Consequently, the
results for $C_1$ follow from (5.19)-(5.21). We have further shown $e(C_2)/m \overset{p}{\to} 0$,
which implies $e(C_2)/n \overset{p}{\to} 0$ and $v(C_2)/n \overset{p}{\to} 0$ because $m = O(n)$ and
$v(C_2) \le e(C_2) + 1$. □



Proof of Theorem 2.3(ii). This is very similar to the last step in the proof
for (i). Let $T_1 = 0$ and let $T_2$ be the next time C1 is performed. Then

$$\sup_{t \le T_2} \bigl|A(t) - \tilde A(t)\bigr| = \sup_{t \le T_2} \bigl|\tilde S(t) - S(t)\bigr| \le 2d_{\max} = o(n). \tag{5.24}$$

For every $\varepsilon > 0$, we have by (5.6) and Lemma 5.5(ii) $n^{-1}\tilde A(\varepsilon) \overset{p}{\to} H(e^{-\varepsilon}) < 0$,
while $A(\varepsilon) \ge 0$, and it follows from (5.24) that whp $T_2 < \varepsilon$. Hence,
$T_2 \overset{p}{\to} 0$. We apply Lemma 5.6 (which holds in this case too, with $\tau = 0$)
and find that if $C$ is the first component, then $e(C)/n \overset{p}{\to} 0$.

Let $\varepsilon > 0$. If $e(C_1) \ge \varepsilon m$, then the probability that the first half-edge
chosen by C1 belongs to $C_1$, and thus $C = C_1$, is $2e(C_1)/(2m) \ge \varepsilon$, and hence,
using $m = \Theta(n)$ by (2.4),

$$\mathbb{P}(e(C_1) \ge \varepsilon m) \le \varepsilon^{-1}\, \mathbb{P}(e(C) \ge \varepsilon m) \to 0.$$

The results follow since $m = O(n)$ and $v(C_1) \le e(C_1) + 1$. □

6. Proof of Theorem 2.4 for $G^*(n, (d_i)_1^n)$

We assume in this section that the assumptions of Theorem 2.4 hold.
Note first that $\beta := \mathbb{E} D(D-1)(D-2) \ge 0$ with strict inequality unless
$\mathbb{P}(D \le 2) = 1$, but in the latter case $\mathbb{E} D(D-2) = -p_1 < 0$, which is
ruled out by the assumptions.

Define, in analogy with (2.5)-(2.7),

$$g_n(x) := \mathbb{E} x^{D_n} = \sum_{k=0}^{\infty} \frac{n_k}{n} x^k,$$

$$h_n(x) := x g_n'(x) = \sum_{k=1}^{\infty} k \frac{n_k}{n} x^k,$$

$$H_n(x) := \frac{2m}{n} x^2 - h_n(x) = \sum_{k=1}^{\infty} k \frac{n_k}{n} \bigl(x^2 - x^k\bigr).$$

We begin with a general estimate for death processes and use it to prove
estimates improving Lemmas 5.1 and 5.2.

Lemma 6.1. Let $\gamma > 0$ and $d > 0$ be fixed. Let $N^{(x)}(t)$ be a Markov
process such that $N^{(x)}(0) = x$ a.s. and transitions are made according to
the following rule: whenever in state $y > 0$, the process jumps to $y - d$
with intensity $\gamma y$; in other words, the waiting time until the next event is
$\mathrm{Exp}(1/\gamma y)$ and each jump is of size $d$ downwards. Then, for every $t_0 \ge 0$,

$$\mathbb{E} \sup_{t \le t_0} \bigl|N^{(x)}(t) - e^{-\gamma d t}x\bigr|^2 \le 8d\bigl(e^{\gamma d t_0} - 1\bigr)x + 8d^2. \tag{6.1}$$

If $x/d$ is an integer, we also have the better estimate

$$\mathbb{E} \sup_{t \le t_0} \bigl|N^{(x)}(t) - e^{-\gamma d t}x\bigr|^2 \le 4d\bigl(e^{\gamma d t_0} - 1\bigr)x. \tag{6.2}$$

Proof. First assume that $d = 1$ and that $x$ is an integer. In this case, the
process is a standard pure death process taking the values $x, x-1, \dots, 0$,
describing the number of particles alive when the particles die independently
with rate $\gamma$. As is well-known, and easily seen by regarding $N^{(x)}(t)$ as the
sum of $x$ independent copies of the process $N^{(1)}(t)$, the process $e^{\gamma t} N^{(x)}(t)$,
$t \ge 0$, is a martingale. Furthermore, for every $t \ge 0$, $N^{(x)}(t) \sim \mathrm{Bi}(x, e^{-\gamma t})$.
Hence, by Doob's inequality,

\begin{align*}
\mathbb{E} \sup_{t \le t_0} \bigl|N^{(x)}(t) - x e^{-\gamma t}\bigr|^2
 &\le \mathbb{E} \sup_{t \le t_0} \bigl|e^{\gamma t} N^{(x)}(t) - x\bigr|^2
 \le 4\, \mathbb{E} \bigl|e^{\gamma t_0} N^{(x)}(t_0) - x\bigr|^2 \\
 &= 4 e^{2\gamma t_0} \operatorname{Var} N^{(x)}(t_0) = 4\,(e^{\gamma t_0} - 1)\,x. \tag{6.3}
\end{align*}



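Next, still assume $d = 1$ but let $x \ge 0$ be arbitrary. We can couple the
two processes $N^{(x)}(t)$ and $N^{(\lfloor x \rfloor)}(t)$ with different initial values such that
whenever the smaller one jumps (by $-1$), so does the other. This coupling
keeps $|N^{(x)}(t) - N^{(\lfloor x \rfloor)}(t)| < 1$ for all $t \ge 0$, and thus

\[ \sup_{t \le t_0} \bigl|N^{(x)}(t) - x e^{-\gamma t}\bigr| \le \sup_{t \le t_0} \bigl|N^{(\lfloor x \rfloor)}(t) - \lfloor x \rfloor e^{-\gamma t}\bigr| + 2, \]

and hence by (6.3)

\[ \mathbb{E} \sup_{t \le t_0} \bigl|N^{(x)}(t) - x e^{-\gamma t}\bigr|^2 \le 8\,(e^{\gamma t_0} - 1)\,x + 8. \tag{6.4} \]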



Finally, for a general $d > 0$ we observe that $N^{(x)}(t)/d$ is a process of the
same type with the parameters $(\gamma, d, x)$ replaced by $(\gamma d, 1, x/d)$, and the
general result follows from (6.4) and (6.3). □
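As an informal numerical illustration of Lemma 6.1 (not part of the proof), the following sketch simulates the case $d = 1$ with integer $x$, where the process counts the survivors among $x$ particles with independent $\mathrm{Exp}(\gamma)$ lifetimes, and compares a Monte Carlo estimate of $\mathbb{E} \sup_{t \le t_0} |N^{(x)}(t) - x e^{-\gamma t}|^2$ with the right-hand side of (6.3). The function names and the parameter values $x = 200$, $\gamma = 1$, $t_0 = 0.5$ are our own assumptions, chosen only for illustration.

```python
import math
import random


def sup_deviation(x, gamma, t0, rng):
    """Return sup_{t <= t0} |N(t) - x*exp(-gamma*t)| for one pure death path.

    N(t) counts how many of x particles with i.i.d. Exp(gamma) lifetimes
    are still alive at time t (the case d = 1 of the lemma).
    """
    deaths = sorted(s for s in (rng.expovariate(gamma) for _ in range(x)) if s <= t0)
    # Deviation at the right endpoint t0.
    best = abs((x - len(deaths)) - x * math.exp(-gamma * t0))
    # Between jumps N is constant and the mean curve is monotone, so the
    # supremum is reached at a jump time (left limit or new value) or at t0.
    for i, s in enumerate(deaths, start=1):
        mean = x * math.exp(-gamma * s)
        best = max(best, abs((x - i + 1) - mean), abs((x - i) - mean))
    return best


def estimate_second_moment(x, gamma, t0, trials, seed=0):
    """Monte Carlo estimate of E sup_{t <= t0} |N(t) - x*exp(-gamma*t)|^2."""
    rng = random.Random(seed)
    return sum(sup_deviation(x, gamma, t0, rng) ** 2 for _ in range(trials)) / trials


# Illustrative parameters (assumptions, not taken from the paper).
x, gamma, t0 = 200, 1.0, 0.5
estimate = estimate_second_moment(x, gamma, t0, trials=2000)
bound = 4 * (math.exp(gamma * t0) - 1) * x
print(f"Monte Carlo estimate: {estimate:.1f}, bound from (6.3): {bound:.1f}")
```

Since Doob's inequality is not tight, one should expect the estimate to lie well below the bound; the point of the lemma is the order of magnitude $(e^{\gamma t_0} - 1)x$, not the constant.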

Lemma 6.2. For every fixed $t_0 > 0$, as $n \to \infty$,

\[ \sup_{t \le \alpha_n t_0} \bigl|L(t) - 2m(n) e^{-2t}\bigr| = O_{\mathrm{p}}\bigl(n^{1/2} \alpha_n^{1/2} + 1\bigr). \]

Proof. $L(t)$ is a death process as in Lemma 6.1 with $\gamma = 1$, $d = 2$ and
$x = L(0) = 2m(n) - 1$. Hence, by Lemma 6.1 applied to $\alpha_n t_0$, observing
that $\alpha_n t_0 = O(\alpha_n) = O(1)$ and $m(n) = O(n)$,

\[ \mathbb{E} \sup_{t \le \alpha_n t_0} \bigl|L(t) - 2m(n) e^{-2t}\bigr|^2 = O\bigl((e^{2\alpha_n t_0} - 1)\, m(n) + 1\bigr) = O(\alpha_n n + 1). \]

□



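Lemma 6.3. For every fixed $t_0 > 0$,

\[ \sup_{t \le \alpha_n t_0} \bigl|\tilde V_k(t) - n_k e^{-kt}\bigr| = O_{\mathrm{p}}\bigl(n^{1/2} \alpha_n^{1/2}\bigr) \]

for every $k \ge 0$ and

\begin{align*}
\sup_{t \le \alpha_n t_0} \Bigl|\sum_{k=0}^{\infty} \tilde V_k(t) - n g_n(e^{-t})\Bigr| &= O_{\mathrm{p}}\bigl(n^{1/2} \alpha_n^{1/2} + n\alpha_n^3\bigr), \\
\sup_{t \le \alpha_n t_0} \bigl|\tilde S(t) - n h_n(e^{-t})\bigr| &= O_{\mathrm{p}}\bigl(n^{1/2} \alpha_n^{1/2} + n\alpha_n^3\bigr).
\end{align*}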

Proof. $\tilde V_k(t)$ is a death process as in Lemma 6.1 with $\gamma = k$, $d = 1$ and
$x = n_k$. Consequently, by (6.2) (for $k \ge 1$; the case $k = 0$ is trivial),

\[ \mathbb{E} \sup_{t \le \alpha_n t_0} \bigl|\tilde V_k(t) - n_k e^{-kt}\bigr|^2 \le 4\,(e^{k\alpha_n t_0} - 1)\, n_k \le 4 k \alpha_n t_0\, e^{k\alpha_n t_0}\, n_k. \tag{6.5} \]

The estimate for fixed $k$ follows immediately.

To treat $\tilde S(t) = \sum_{k=1}^{\infty} k \tilde V_k(t)$, we use (6.5) for $k \le \alpha_n^{-1}$ and obtain

\[ \mathbb{E} \sup_{t \le \alpha_n t_0} \bigl|\tilde V_k(t) - n_k e^{-kt}\bigr| \le \bigl(4 t_0 e^{t_0} k \alpha_n n_k\bigr)^{1/2}, \qquad k \le \alpha_n^{-1}. \]

For $k > \alpha_n^{-1}$ we use the trivial estimate $\sup_t |\tilde V_k(t) - n_k e^{-kt}| \le n_k$. Summing
over $k$ and using the Cauchy-Schwarz inequality and (2.9) we find, for some
$C$ depending on $t_0$,



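\begin{align*}
\mathbb{E} \sup_{t \le \alpha_n t_0} \Bigl|\tilde S(t) - \sum_{k=1}^{\infty} k n_k e^{-kt}\Bigr|
 &\le \mathbb{E} \sum_{k=1}^{\infty} k \sup_{t \le \alpha_n t_0} \bigl|\tilde V_k(t) - n_k e^{-kt}\bigr| \\
 &\le C \sum_{k \le \alpha_n^{-1}} k\,(k \alpha_n n_k)^{1/2} + \sum_{k > \alpha_n^{-1}} k n_k \\
 &\le C \alpha_n^{1/2} \Bigl(\sum_{k=1}^{\infty} k^{-1-\eta}\Bigr)^{1/2} \Bigl(\sum_{k=1}^{\infty} k^{4+\eta} n_k\Bigr)^{1/2} + \alpha_n^3 \sum_{k > \alpha_n^{-1}} k^4 n_k \\
 &= O\bigl(n^{1/2} \alpha_n^{1/2} + n\alpha_n^3\bigr). \tag{6.6}
\end{align*}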

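The estimate for $\sum_{k=0}^{\infty} \tilde V_k(t)$ is proved the same way. □

Lemmas 6.2 and 6.3 imply, cf. (5.5) and (5.6), for every $t_0 > 0$,

\begin{align*}
\sup_{t \le t_0} \alpha_n^{-2} \bigl|n^{-1} \tilde A(\alpha_n t) - H_n(e^{-\alpha_n t})\bigr|
 &= \alpha_n^{-2} n^{-1} \sup_{t \le \alpha_n t_0} \bigl|L(t) - \tilde S(t) - n H_n(e^{-t})\bigr| \\
 &= O_{\mathrm{p}}\bigl(n^{-1/2} \alpha_n^{-3/2} + n^{-1} \alpha_n^{-2} + \alpha_n\bigr) = o_{\mathrm{p}}(1),
\end{align*}

recalling $\alpha_n \to 0$ by Remark 2.5 and $n\alpha_n^3 \to \infty$.

Let

\[ \hat H_n(t) := H_n(e^{-t}) = \mathbb{E}\, D_n \bigl(e^{-2t} - e^{-D_n t}\bigr). \]

Then $\hat H_n(0) = 0$, $\hat H_n'(0) = \mathbb{E} D_n(D_n - 2) = \alpha_n$, $\hat H_n''(0) = \mathbb{E} D_n(4 - D_n^2) = -\mathbb{E} D_n(D_n + 2)(D_n - 2)$, and for all $t \ge 0$,

\[ |\hat H_n'''(t)| = \bigl|\mathbb{E}\bigl(D_n (8 e^{-2t} - D_n^3 e^{-D_n t})\bigr)\bigr| \le \mathbb{E}(8 D_n + D_n^4) = O(1). \]

Moreover, using Remark 2.5,

\begin{align*}
\hat H_n''(0) = -\mathbb{E} D_n(D_n + 2)(D_n - 2)
 &\to -\mathbb{E} D(D + 2)(D - 2) \\
 &= -\mathbb{E} D(D - 1)(D - 2) - 3\, \mathbb{E} D(D - 2) = -\beta.
\end{align*}

Hence a Taylor expansion yields, for $t \ge 0$,

\[ \hat H_n(\alpha_n t) = \alpha_n^2 t + \tfrac12 \hat H_n''(0) (\alpha_n t)^2 + O\bigl((\alpha_n t)^3\bigr) = \alpha_n^2 \bigl(t - \tfrac12 \beta t^2 + o(t^2 + t^3)\bigr). \]

Consequently, (6.6) yields, for every fixed $t_0 > 0$,

\[ \sup_{t \le t_0} \bigl|\alpha_n^{-2} n^{-1} \tilde A(\alpha_n t) - (t - \tfrac12 \beta t^2)\bigr| = o_{\mathrm{p}}(1). \tag{6.7} \]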
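The Taylor expansion behind (6.7) can be checked numerically on a toy example. The sketch below assumes a hypothetical two-point family of degree distributions, $\mathbb{P}(D_n = 1) = 3/4 - \epsilon$ and $\mathbb{P}(D_n = 3) = 1/4 + \epsilon$, for which $\alpha_n = \mathbb{E} D_n(D_n - 2) = 4\epsilon$ and the limit ($\epsilon = 0$) has $\beta = 3/2$. It evaluates $\alpha_n^{-2} H_n(e^{-\alpha_n t})$ on a grid of $t \in [0, 3]$ and reports the largest distance to $t - \tfrac12 \beta t^2$. The family, the grid and the function names are our own choices.

```python
import math


def H_hat(t, pmf):
    """H_n(e^{-t}) = E[D (e^{-2t} - e^{-D t})] for a degree pmf {k: p_k}."""
    return sum(p * k * (math.exp(-2 * t) - math.exp(-k * t)) for k, p in pmf.items())


def max_scaled_error(eps, beta=1.5, t_max=3.0, steps=300):
    """Largest gap between H_hat(alpha*t)/alpha^2 and t - beta*t^2/2 on a grid."""
    # Hypothetical family: P(D=1) = 3/4 - eps, P(D=3) = 1/4 + eps,
    # so alpha = E D(D-2) = 4*eps, and the eps = 0 limit has beta = 1.5.
    pmf = {1: 0.75 - eps, 3: 0.25 + eps}
    alpha = sum(p * k * (k - 2) for k, p in pmf.items())
    worst = 0.0
    for i in range(steps + 1):
        t = t_max * i / steps
        scaled = H_hat(alpha * t, pmf) / alpha ** 2
        worst = max(worst, abs(scaled - (t - 0.5 * beta * t * t)))
    return worst


errors = [max_scaled_error(eps) for eps in (1e-2, 1e-3, 1e-4)]
for eps, err in zip((1e-2, 1e-3, 1e-4), errors):
    print(f"alpha = {4 * eps:.4f}: max error on [0, 3] = {err:.4f}")
```

In this example the error shrinks roughly in proportion to $\alpha_n$, in line with the $o(t^2 + t^3)$ remainder for bounded $t$.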



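We now proceed as in the proof of Theorem 2.3, using $\hat H(t) := t - \tfrac12 \beta t^2$
instead of $H(e^{-t})$. We note that $\hat H(t) > 0$ for $0 < t < 2/\beta$ and $\hat H(t) < 0$ for
$t > 2/\beta$; thus we now define $\tau = 2/\beta$. We obtain from (6.7), for any random
$T \overset{\mathrm{p}}{\longrightarrow} \tau$,

\[ \alpha_n^{-2} n^{-1} \inf_{t \le T} \tilde A(\alpha_n t) \overset{\mathrm{p}}{\longrightarrow} 0 \]

and, using (5.7), since by (2.9) $d_{\max} = o(n^{1/3}) = o(n\alpha_n^2)$,

\[ \alpha_n^{-2} n^{-1} \sup_{t \le T} \bigl|A(\alpha_n t) - \tilde A(\alpha_n t)\bigr| = \alpha_n^{-2} n^{-1} \sup_{t \le T} \bigl|\tilde S(\alpha_n t) - S(\alpha_n t)\bigr| \overset{\mathrm{p}}{\longrightarrow} 0 \tag{6.8} \]

and thus, by (6.7) again,

\[ \sup_{t \le T} \bigl|\alpha_n^{-2} n^{-1} A(\alpha_n t) - \hat H(t)\bigr| \overset{\mathrm{p}}{\longrightarrow} 0. \]

Taking $T = \tau$, it follows as in Section 5 that whp there is a component
$\mathcal{C}'$ explored between two random times $T_1$ and $T_2$ with $T_1/\alpha_n \overset{\mathrm{p}}{\longrightarrow} 0$ and
$T_2/\alpha_n \overset{\mathrm{p}}{\longrightarrow} \tau = 2/\beta$. We have the following analogue of Lemma 5.6.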

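Lemma 6.4. Let $T_1$ and $T_2$ be two (random) times when C1 is performed,
with $T_1 \le T_2$, and assume that $T_1/\alpha_n \overset{\mathrm{p}}{\longrightarrow} t_1$ and $T_2/\alpha_n \overset{\mathrm{p}}{\longrightarrow} t_2$ where
$0 \le t_1 \le t_2 \le \tau = 2/\beta$. If $\mathcal{C}$ is the union of all components explored between
$T_1$ and $T_2$, then

\begin{align*}
v_k(\mathcal{C}) &= n\alpha_n k p_k (t_2 - t_1) + o_{\mathrm{p}}(n\alpha_n), \qquad k \ge 0, \\
v(\mathcal{C}) &= n\alpha_n \lambda (t_2 - t_1) + o_{\mathrm{p}}(n\alpha_n), \\
e(\mathcal{C}) &= n\alpha_n \lambda (t_2 - t_1) + o_{\mathrm{p}}(n\alpha_n).
\end{align*}

In particular, if $t_1 = t_2$, then $v(\mathcal{C}) = o_{\mathrm{p}}(n\alpha_n)$ and $e(\mathcal{C}) = o_{\mathrm{p}}(n\alpha_n)$.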

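Proof. $\mathcal{C}$ consists of the vertices awakened in the interval $[T_1, T_2)$, and thus,
using (6.8) and Lemma 6.3,

\begin{align*}
v_k(\mathcal{C}) = V_k(T_1-) - V_k(T_2-) &= \tilde V_k(T_1-) - \tilde V_k(T_2-) + o_{\mathrm{p}}(n\alpha_n^2) \\
 &= n_k \bigl(e^{-kT_1} - e^{-kT_2}\bigr) + o_{\mathrm{p}}(n\alpha_n^2) \\
 &= n_k \bigl(kT_2 - kT_1 + O_{\mathrm{p}}(\alpha_n^2)\bigr) + o_{\mathrm{p}}(n\alpha_n^2) \\
 &= n_k k \alpha_n (t_2 - t_1) + o_{\mathrm{p}}(n\alpha_n) \\
 &= k p_k n\alpha_n (t_2 - t_1) + o_{\mathrm{p}}(n\alpha_n).
\end{align*}

Further, since $g_n'(1) = \mathbb{E} D_n \to \lambda$ and $g_n''(x) = O(1)$ for $0 < x < 1$, a
Taylor expansion yields, for $j = 1, 2$,

\[ g_n(e^{-T_j}) = g_n(1) - g_n'(1) T_j + O(T_j^2) = 1 - \lambda \alpha_n t_j + o_{\mathrm{p}}(\alpha_n). \]

Similarly, since $h_n'(1) = \mathbb{E} D_n^2 = \mathbb{E} D_n(D_n - 2) + 2\, \mathbb{E} D_n \to 2\lambda$, and $h_n''(x) = O(1)$ for
$0 < x < 1$,

\[ h_n(e^{-T_j}) = h_n(1) - h_n'(1) T_j + O(T_j^2) = \mathbb{E} D_n - 2\lambda \alpha_n t_j + o_{\mathrm{p}}(\alpha_n). \]

It now follows from (6.8) and Lemma 6.3 that

\begin{align*}
v(\mathcal{C}) &= n g_n(e^{-T_1}) - n g_n(e^{-T_2}) + o_{\mathrm{p}}(n\alpha_n^2) = n\alpha_n \lambda (t_2 - t_1) + o_{\mathrm{p}}(n\alpha_n), \\
2e(\mathcal{C}) &= n h_n(e^{-T_1}) - n h_n(e^{-T_2}) + o_{\mathrm{p}}(n\alpha_n^2) = 2 n\alpha_n \lambda (t_2 - t_1) + o_{\mathrm{p}}(n\alpha_n).
\end{align*}

□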

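In particular, for the component $\mathcal{C}'$ found above, with $t_1 = 0$ and $t_2 = \tau$,

\begin{align*}
v_k(\mathcal{C}') &= k p_k \tau n\alpha_n + o_{\mathrm{p}}(n\alpha_n), \tag{6.9} \\
v(\mathcal{C}') &= \lambda \tau n\alpha_n + o_{\mathrm{p}}(n\alpha_n), \tag{6.10} \\
e(\mathcal{C}') &= \lambda \tau n\alpha_n + o_{\mathrm{p}}(n\alpha_n). \tag{6.11}
\end{align*}

Since $\tau = 2/\beta$, these are the estimates we claim for $\mathcal{C}_1$, and it remains
only to show that whp all other components are much smaller than $\mathcal{C}'$.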
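To get a feeling for the first-order estimates (6.10) and (6.11), one can compare them with the exact law-of-large-numbers limits for a fixed, slightly supercritical degree distribution. The sketch below assumes the usual supercritical limits $v(\mathcal{C}_1)/n \to 1 - g(\xi)$ and $e(\mathcal{C}_1)/n \to \tfrac12 \lambda (1 - \xi^2)$, where $\xi \in (0,1)$ solves $H(\xi) = 0$, and a hypothetical two-point family $\mathbb{P}(D = 1) = 3/4 - \epsilon$, $\mathbb{P}(D = 3) = 1/4 + \epsilon$, for which $\xi = p_1/(3p_3)$ in closed form. The function name and the choice of family are ours.

```python
def compare(eps):
    """Ratios exact/first-order for P(D=1) = 3/4 - eps, P(D=3) = 1/4 + eps.

    The exact limits are the assumed supercritical law of large numbers;
    the first-order value is lambda * tau * alpha with tau = 2 / beta.
    """
    p1, p3 = 0.75 - eps, 0.25 + eps
    lam = p1 + 3 * p3                  # E D
    alpha = -p1 + 3 * p3               # E D(D-2) = 4 * eps
    beta = 6 * p3                      # E D(D-1)(D-2) for this distribution
    xi = p1 / (3 * p3)                 # root of lam*x^2 - x*g'(x) in (0, 1)
    exact_v = 1 - (p1 * xi + p3 * xi ** 3)     # 1 - g(xi)
    exact_e = 0.5 * lam * (1 - xi ** 2)
    first_order = lam * (2 / beta) * alpha     # lambda * tau * alpha
    return exact_v / first_order, exact_e / first_order


for eps in (0.025, 0.005, 0.001):
    r_v, r_e = compare(eps)
    print(f"alpha = {4 * eps:.3f}: vertex ratio = {r_v:.4f}, edge ratio = {r_e:.4f}")
```

Both ratios tend to $1$ as $\alpha \to 0$ in this family, which is what (6.10) and (6.11) predict to first order; the sketch says nothing about the fluctuations for finite $n$.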

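Fix $\varepsilon > 0$ with $\varepsilon < \tau$, and say that a component of $G^*(n, (d_i)_1^n)$ is large if
it has at least $\varepsilon m\alpha_n$ edges ($2\varepsilon m\alpha_n$ half-edges). Since, by (6.11) and (2.4),
$e(\mathcal{C}')/(m\alpha_n) \overset{\mathrm{p}}{\longrightarrow} 2\tau$, whp $\mathcal{C}'$ is large, and further $(2\tau - \varepsilon) m\alpha_n < e(\mathcal{C}') <
(2\tau + \varepsilon) m\alpha_n$. Let $\mathcal{E}_\varepsilon$ be the event that $e(\mathcal{C}') < (2\tau + \varepsilon) m\alpha_n$ and that the
total number of edges in large components is at least $(2\tau + 2\varepsilon) m\alpha_n$.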

It follows by Lemma 6.4 applied to $T_0 = 0$ and $T_1$ that the total number
of vertices or edges in components found before $\mathcal{C}'$ is $o_{\mathrm{p}}(n\alpha_n)$. Thus there
exists a sequence $\alpha_n'$ of constants such that $\alpha_n' = o(\alpha_n)$ and whp at most
$n\alpha_n'$ vertices are found before $T_1$, when the first large component is found.

Let us now condition on the final graph obtained through our component-finding
algorithm. Given $G^*(n, (d_i)_1^n)$, the components appear in our process
in the size-biased order (with respect to the number of edges) obtained by
picking half-edges uniformly at random (with replacement, for simplicity)
and taking the corresponding components, ignoring every component that
already has been taken. We have seen that whp this finds components
containing at most $n\alpha_n'$ vertices before a half-edge in a large component is
picked. Therefore, starting again at $T_2$, whp we find at most $n\alpha_n'$ vertices in
new components before a half-edge is chosen in some large component; this
half-edge may belong to $\mathcal{C}'$, but if $\mathcal{E}_\varepsilon$ holds, then with probability at least
$\varepsilon_1 := 1 - (2\tau + \varepsilon)/(2\tau + 2\varepsilon)$ it does not, and therefore it belongs to a new
large component. Consequently, with probability at least $\varepsilon_1 \mathbb{P}(\mathcal{E}_\varepsilon) + o(1)$, the
algorithm in Section 4 finds a second large component at a time $T_3$, and fewer
than $n\alpha_n'$ vertices between $T_2$ and $T_3$. In this case, let $T_4$ be the time this
second large component is completed. (If no such second large component
is found, let for definiteness $T_3 = T_4 = T_2$.)
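The size-biased order used in this conditioning argument can be sketched in a few lines. The code below takes a hypothetical list of components, given only by their numbers of edges, repeatedly picks a half-edge uniformly at random with replacement, and records each component the first time it is hit. The function name and the toy component sizes are assumptions made for illustration; components without edges are excluded, since they contain no half-edge to pick.

```python
import random


def size_biased_order(edge_counts, rng):
    """Order components by repeatedly picking a uniform half-edge.

    edge_counts maps a component label to its number of edges (>= 1);
    a component with e edges owns 2*e half-edges. Picks are made with
    replacement, and a component that was already taken is ignored.
    """
    labels = list(edge_counts)
    weights = [2 * edge_counts[c] for c in labels]
    order, seen = [], set()
    while len(order) < len(labels):
        c = rng.choices(labels, weights=weights, k=1)[0]
        if c not in seen:
            seen.add(c)
            order.append(c)
    return order


rng = random.Random(0)
# Hypothetical graph: one component with 50 edges and five with 5 edges each.
components = {"big": 50, "a": 5, "b": 5, "c": 5, "d": 5, "e": 5}
trials = 2000
big_first = sum(size_biased_order(components, rng)[0] == "big" for _ in range(trials))
print(f"fraction of runs with the big component first: {big_first / trials:.3f}")
print(f"its share of the half-edges: {50 / 75:.3f}")
```

The first component in the order is the one containing the first half-edge picked, so it is chosen with probability proportional to its number of edges; this is the fact used above to bound the probability that the next large component found is again $\mathcal{C}'$.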

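Note that $0 \le \tilde V_1(t) - V_1(t) \le \tilde S(t) - S(t)$ for all $t$. Hence, using Lemma 6.3
and (6.8) with $T = T_2/\alpha_n$, the number of vertices of degree 1 found between
$T_2$ and $T_3$ is

\begin{align*}
V_1(T_2-) - V_1(T_3-) &\ge \tilde V_1(T_2-) - \bigl(\tilde S(T_2-) - S(T_2-)\bigr) - \tilde V_1(T_3-) \\
 &= n_1 e^{-T_2} - n_1 e^{-T_3} + o_{\mathrm{p}}(n\alpha_n^2).
\end{align*}

Since this is at most $n\alpha_n' = o(n\alpha_n)$, and $n_1/n \to p_1 > 0$, it follows that
$e^{-T_2} - e^{-T_3} = o_{\mathrm{p}}(\alpha_n)$, and thus $T_3 = T_2 + o_{\mathrm{p}}(\alpha_n) = \tau\alpha_n + o_{\mathrm{p}}(\alpha_n)$. Hence,
(6.8) applies to $T = T_3/\alpha_n$, and since no C1 is performed between $T_3$ and
$T_4$,

\[ \sup_{t \le T_4} \bigl|\tilde S(t) - S(t)\bigr| \le \sup_{t \le T_3} \bigl|\tilde S(t) - S(t)\bigr| + d_{\max} = o_{\mathrm{p}}(n\alpha_n^2). \tag{6.12} \]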

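Let $t_0 > \tau$; thus $\hat H(t_0) = t_0 - \tfrac12 \beta t_0^2 < 0$ and (6.7) yields, with $\delta =
|\hat H(t_0)|/2 > 0$, whp $\tilde A(\alpha_n t_0) \le -n\alpha_n^2 \delta$ and thus

\[ \tilde S(\alpha_n t_0) - S(\alpha_n t_0) = A(\alpha_n t_0) - \tilde A(\alpha_n t_0) \ge n\alpha_n^2 \delta. \]

Hence (6.12) shows that whp $T_4 < \alpha_n t_0$. Since $t_0 > \tau$ is arbitrary, and
further $T_2 \le T_3 \le T_4$ and $T_2/\alpha_n \overset{\mathrm{p}}{\longrightarrow} \tau$, it follows that $T_3/\alpha_n \overset{\mathrm{p}}{\longrightarrow} \tau$ and
$T_4/\alpha_n \overset{\mathrm{p}}{\longrightarrow} \tau$.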

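Finally, by Lemma 6.4 again, this time applied to $T_3$ and $T_4$, the number
of edges found between $T_3$ and $T_4$ is $o_{\mathrm{p}}(n\alpha_n) = o_{\mathrm{p}}(m\alpha_n)$. Hence, whp
there is no large component found there, although the construction gave
a large component with probability at least $\varepsilon_1 \mathbb{P}(\mathcal{E}_\varepsilon) + o(1)$. Consequently,

\[ \varepsilon_1 \mathbb{P}(\mathcal{E}_\varepsilon) = o(1) \quad\text{and}\quad \mathbb{P}(\mathcal{E}_\varepsilon) = o(1). \]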

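Recalling the definition of $\mathcal{E}_\varepsilon$, we see that whp the total number of edges
in large components is at most $(2\tau + 2\varepsilon) m\alpha_n$; since whp at least $(2\tau - \varepsilon) m\alpha_n$
of these belong to $\mathcal{C}'$, there are at most $3\varepsilon m\alpha_n$ edges, and therefore at most
$3\varepsilon m\alpha_n + 1$ vertices, in any other component.

Choosing $\varepsilon$ small enough, this shows that whp $\mathcal{C}_1 = \mathcal{C}'$, and further
$v(\mathcal{C}_2) \le e(\mathcal{C}_2) + 1 \le 3\varepsilon m\alpha_n + 1 \le 3\lambda\varepsilon n\alpha_n$. □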

7. Conceivable extensions 

It seems to be possible to obtain quantitative versions of our results, such
as a central limit theorem for the size of the giant component, as we did
for the $k$-core in [16]. (See Pittel [1^ and Barraez, Boucheron and de la
Vega [3] for $G(n,p)$ and $G(n,m)$, and [2^ for the random cluster model.)
Similarly, it should be possible to obtain large deviation estimates.

Further, in the transition window, where $\alpha_n = O(n^{-1/3})$, an appropriate
scaling seems to lead to convergence to Gaussian processes resembling the
one studied by Aldous [1], and it seems likely that similar results on the
distribution of the sizes of the largest components could be obtained.

We have not attempted above to give more precise bounds on the size
of the second component $\mathcal{C}_2$, and we leave it as an open problem to see
whether our methods can lead to new insights for this problem. It appears
that direct analysis of the Markov process $(A(t), V_0(t), V_1(t), \dots)$ can show
that the largest component has size $O(\log n)$ in the subcritical phase, and
that so does the second largest component in the supercritical case, but we
have not pursued this.

Finally, it seems possible to adapt the methods of this paper to random
hypergraphs and obtain results similar to those in Behrisch, Coja-Oghlan
and Kang [1], but we leave this to the reader.